2017-05-19

You will learn in this session

  • how to handle temporal and spatial autocorrelation
  • that GLMMs are not very difficult if you already know GLMs and LMMs
  • that random effects, too, can have non-Gaussian distributions
  • that there are even more general methods than GLMMs out there: HGLMs and DHGLMs
  • how to handle heteroscedasticity

Temporal autocorrelation

Temporal autocorrelation in discrete (equally spaced) time steps

The AirPassengers data

AirPassengers
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432

The AirPassengers data

plot(AirPassengers)

Reformatting the dataset for the fit

air <- data.frame(passengers = as.numeric(AirPassengers),
                  year = rep(1949:1960, each = 12),
                  month = factor(rep(1:12, 12)))
air
##     passengers year month
## 1          112 1949     1
## 2          118 1949     2
## 3          132 1949     3
## 4          129 1949     4
## 5          121 1949     5
## 6          135 1949     6
## 7          148 1949     7
## 8          148 1949     8
## 9          136 1949     9
## 10         119 1949    10
## 11         104 1949    11
## 12         118 1949    12
## 13         115 1950     1
## 14         126 1950     2
## 15         141 1950     3
## 16         135 1950     4
## 17         125 1950     5
## 18         149 1950     6
## 19         170 1950     7
## 20         170 1950     8
## 21         158 1950     9
## 22         133 1950    10
## 23         114 1950    11
## 24         140 1950    12
## 25         145 1951     1
## 26         150 1951     2
## 27         178 1951     3
## 28         163 1951     4
## 29         172 1951     5
## 30         178 1951     6
## 31         199 1951     7
## 32         199 1951     8
## 33         184 1951     9
## 34         162 1951    10
## 35         146 1951    11
## 36         166 1951    12
## 37         171 1952     1
## 38         180 1952     2
## 39         193 1952     3
## 40         181 1952     4
## 41         183 1952     5
## 42         218 1952     6
## 43         230 1952     7
## 44         242 1952     8
## 45         209 1952     9
## 46         191 1952    10
## 47         172 1952    11
## 48         194 1952    12
## 49         196 1953     1
## 50         196 1953     2
## 51         236 1953     3
## 52         235 1953     4
## 53         229 1953     5
## 54         243 1953     6
## 55         264 1953     7
## 56         272 1953     8
## 57         237 1953     9
## 58         211 1953    10
## 59         180 1953    11
## 60         201 1953    12
## 61         204 1954     1
## 62         188 1954     2
## 63         235 1954     3
## 64         227 1954     4
## 65         234 1954     5
## 66         264 1954     6
## 67         302 1954     7
## 68         293 1954     8
## 69         259 1954     9
## 70         229 1954    10
## 71         203 1954    11
## 72         229 1954    12
## 73         242 1955     1
## 74         233 1955     2
## 75         267 1955     3
## 76         269 1955     4
## 77         270 1955     5
## 78         315 1955     6
## 79         364 1955     7
## 80         347 1955     8
## 81         312 1955     9
## 82         274 1955    10
## 83         237 1955    11
## 84         278 1955    12
## 85         284 1956     1
## 86         277 1956     2
## 87         317 1956     3
## 88         313 1956     4
## 89         318 1956     5
## 90         374 1956     6
## 91         413 1956     7
## 92         405 1956     8
## 93         355 1956     9
## 94         306 1956    10
## 95         271 1956    11
## 96         306 1956    12
## 97         315 1957     1
## 98         301 1957     2
## 99         356 1957     3
## 100        348 1957     4
## 101        355 1957     5
## 102        422 1957     6
## 103        465 1957     7
## 104        467 1957     8
## 105        404 1957     9
## 106        347 1957    10
## 107        305 1957    11
## 108        336 1957    12
## 109        340 1958     1
## 110        318 1958     2
## 111        362 1958     3
## 112        348 1958     4
## 113        363 1958     5
## 114        435 1958     6
## 115        491 1958     7
## 116        505 1958     8
## 117        404 1958     9
## 118        359 1958    10
## 119        310 1958    11
## 120        337 1958    12
## 121        360 1959     1
## 122        342 1959     2
## 123        406 1959     3
## 124        396 1959     4
## 125        420 1959     5
## 126        472 1959     6
## 127        548 1959     7
## 128        559 1959     8
## 129        463 1959     9
## 130        407 1959    10
## 131        362 1959    11
## 132        405 1959    12
## 133        417 1960     1
## 134        391 1960     2
## 135        419 1960     3
## 136        461 1960     4
## 137        472 1960     5
## 138        535 1960     6
## 139        622 1960     7
## 140        606 1960     8
## 141        508 1960     9
## 142        461 1960    10
## 143        390 1960    11
## 144        432 1960    12

Looking at the average trend per year

plot(with(air, tapply(passengers, year, mean)) ~ I(1949:1960),
     ylab = "Mean number of passengers", xlab = "Year", type = "b")

Looking at the average trend per month

plot(with(air, tapply(passengers, month, mean)) ~ I(1:12),
     ylab = "Mean number of passengers", xlab = "Month", type = "b")

Simple fit

(mod_air <- lm(passengers ~ year + month, data = air))
## 
## Call:
## lm(formula = passengers ~ year + month, data = air)
## 
## Coefficients:
## (Intercept)         year       month2       month3       month4       month5       month6  
##  -62153.612       31.924       -6.750       28.417       25.333       30.083       69.917  
##      month7       month8       month9      month10      month11      month12  
##     109.583      109.333       60.667       24.833       -8.917       20.083

The problem

plot(residuals(mod_air), type = "b")
abline(h = 0, lty = 2, col = "red")

The problem

lmtest::dwtest(mod_air)
## 
##  Durbin-Watson test
## 
## data:  mod_air
## DW = 0.45024, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0

The problem

acf(residuals(mod_air))
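To build intuition for this ACF pattern, one can simulate pure AR(1) noise with base R's arima.sim() and inspect its ACF. This is only a sketch; the coefficient 0.75 is chosen to be close to the Phi estimated later for these data.

```r
## Sketch: simulate AR(1) noise and look at its ACF; the geometric decay
## mirrors the pattern seen in the residuals of mod_air
set.seed(123)  ## arbitrary seed, for reproducibility only
sim <- arima.sim(model = list(ar = 0.75), n = 144)
acf(sim)
```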

Solution

library(nlme)
MAR1 <- corAR1(value = 0.5, form = ~ 1|year, fixed = FALSE)
MAR1 <- Initialize(MAR1, data = air)
round(corMatrix(MAR1)[["1950"]], 2)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,] 1.00 0.50 0.25 0.13 0.06 0.03 0.02 0.01 0.00  0.00  0.00  0.00
##  [2,] 0.50 1.00 0.50 0.25 0.13 0.06 0.03 0.02 0.01  0.00  0.00  0.00
##  [3,] 0.25 0.50 1.00 0.50 0.25 0.13 0.06 0.03 0.02  0.01  0.00  0.00
##  [4,] 0.13 0.25 0.50 1.00 0.50 0.25 0.13 0.06 0.03  0.02  0.01  0.00
##  [5,] 0.06 0.13 0.25 0.50 1.00 0.50 0.25 0.13 0.06  0.03  0.02  0.01
##  [6,] 0.03 0.06 0.13 0.25 0.50 1.00 0.50 0.25 0.13  0.06  0.03  0.02
##  [7,] 0.02 0.03 0.06 0.13 0.25 0.50 1.00 0.50 0.25  0.13  0.06  0.03
##  [8,] 0.01 0.02 0.03 0.06 0.13 0.25 0.50 1.00 0.50  0.25  0.13  0.06
##  [9,] 0.00 0.01 0.02 0.03 0.06 0.13 0.25 0.50 1.00  0.50  0.25  0.13
## [10,] 0.00 0.00 0.01 0.02 0.03 0.06 0.13 0.25 0.50  1.00  0.50  0.25
## [11,] 0.00 0.00 0.00 0.01 0.02 0.03 0.06 0.13 0.25  0.50  1.00  0.50
## [12,] 0.00 0.00 0.00 0.00 0.01 0.02 0.03 0.06 0.13  0.25  0.50  1.00
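The entries of this matrix are simply phi^|i - j| with phi = 0.5; a minimal check in base R:

```r
## AR(1) implies corr(e_i, e_j) = phi^|i - j|; reproducing the matrix above
phi <- 0.5
round(phi^abs(outer(1:12, 1:12, "-")), 2)[1:3, 1:3]
##      [,1] [,2] [,3]
## [1,] 1.00 0.50 0.25
## [2,] 0.50 1.00 0.50
## [3,] 0.25 0.50 1.00
```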

Solution

(mod_air2 <- lme(passengers ~ month + year, random = ~ 1 | year, data = air,
                 correlation = MAR1, method = "REML"))
## Linear mixed-effects model fit by REML
##   Data: air 
##   Log-restricted-likelihood: -577.6168
##   Fixed: passengers ~ month + year 
##   (Intercept)        month2        month3        month4        month5        month6        month7 
## -59618.503779     -6.750000     28.416667     25.333333     30.083333     69.916667    109.583333 
##        month8        month9       month10       month11       month12          year 
##    109.333333     60.666667     24.833333     -8.916667     20.083333     30.626889 
## 
## Random effects:
##  Formula: ~1 | year
##         (Intercept) Residual
## StdDev: 0.003269751 25.54159
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | year 
##  Parameter estimate(s):
##       Phi 
## 0.7528233 
## Number of Observations: 144
## Number of Groups: 12

Alternative code

(mod_air2b <- lme(passengers ~ month + year, random = ~ 1 | year, data = air,
                 correlation = corAR1(form = ~ 1|year), method = "REML"))
## Linear mixed-effects model fit by REML
##   Data: air 
##   Log-restricted-likelihood: -577.6168
##   Fixed: passengers ~ month + year 
##   (Intercept)        month2        month3        month4        month5        month6        month7 
## -59618.551838     -6.750000     28.416667     25.333333     30.083333     69.916667    109.583333 
##        month8        month9       month10       month11       month12          year 
##    109.333333     60.666667     24.833333     -8.916667     20.083333     30.626913 
## 
## Random effects:
##  Formula: ~1 | year
##         (Intercept) Residual
## StdDev: 0.002476719 25.54137
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | year 
##  Parameter estimate(s):
##      Phi 
## 0.752818 
## Number of Observations: 144
## Number of Groups: 12

Testing the temporal autocorrelation

mod_air3 <- lme(passengers ~ month + year, random = ~ 1 | year, data = air, method = "REML")
anova(mod_air2, mod_air3)
##          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## mod_air2     1 16 1187.234 1233.237 -577.6168                        
## mod_air3     2 15 1281.208 1324.336 -625.6039 1 vs 2 95.97428  <.0001
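The L.Ratio column is just twice the difference in log-restricted-likelihoods; recomputing it by hand from the values reported above clarifies where the p-value comes from:

```r
## The LRT statistic recomputed by hand from the logLik values above
LR <- 2 * (-577.6168 - (-625.6039))
LR                                      ## 95.9742, as in the anova table
pchisq(LR, df = 1, lower.tail = FALSE)  ## tiny, hence "<.0001"
```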

Alternative autocorrelation structures

mod_airARMA1 <- update(mod_air2, correlation = corARMA(form = ~ 1 | year, p = 1, q = 0))
mod_airARMA2 <- update(mod_air2, correlation = corARMA(form = ~ 1 | year, p = 6, q = 0))
mod_airARMA3 <- update(mod_air2, correlation = corARMA(form = ~ 1 | year, p = 2, q = 2))

rbind(mod_air2 = AIC(mod_air2),
      mod_airARMA1 = AIC(mod_airARMA1),
      mod_airARMA2 = AIC(mod_airARMA2),
      mod_airARMA3 = AIC(mod_airARMA3))
##                  [,1]
## mod_air2     1187.234
## mod_airARMA1 1187.234
## mod_airARMA2 1151.080
## mod_airARMA3 1159.012


Note: do not compare AICs or likelihoods from nlme to those from other packages!

(nlme seems to omit a constant term from the likelihood…)

Fitted values

mod_air4 <- update(mod_air2, correlation = corARMA(form = ~ 1 | year, p = 6, q = 0), method = "ML")
data.for.plot <- expand.grid(month = factor(1:12), year = 1949:1960)
data.for.plot$obs <- air$passengers
data.for.plot$time <- seq(1949, 1960, length = (1960 - 1949 + 1) * 12)
data.for.plot$fit_lm <- predict(mod_air)
data.for.plot$fit_lme <- predict(mod_air4)

Fitted values

plot(obs ~ time, data = data.for.plot, type = "l", ylim = c(0, 700), ylab = "Passengers")
points(fit_lm ~ time, data = data.for.plot, type = "l", col = "red")
points(fit_lme ~ time, data = data.for.plot, type = "l", col = "blue")

Better, but good enough?

plot(residuals(mod_air4), type = "l")
abline(h = 0, lty = 2, col = "red")

Temporal autocorrelation in continuous time

The nlme::BodyWeight dataset

data("BodyWeight", package = "nlme")
plot(BodyWeight)

The nlme::BodyWeight dataset

body <- as.data.frame(BodyWeight)
body$Rat <- factor(body$Rat, levels = 1:16, ordered = FALSE)
str(body)
## 'data.frame':    176 obs. of  4 variables:
##  $ weight: num  240 250 255 260 262 258 266 266 265 272 ...
##  $ Time  : num  1 8 15 22 29 36 43 44 50 57 ...
##  $ Rat   : Factor w/ 16 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Diet  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
unique(body$Time)
##  [1]  1  8 15 22 29 36 43 44 50 57 64
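The key feature motivating a continuous-time structure is visible in these times: the steps are not all equal. A quick check:

```r
## The time steps between consecutive measurements are unequal (note the
## 1-day step between days 43 and 44), so a discrete AR(1) indexed by
## observation order would be inappropriate here
data("BodyWeight", package = "nlme")
diff(unique(as.data.frame(BodyWeight)$Time))
##  [1] 7 7 7 7 7 7 1 6 7 7
```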

Fitting the model

(mod_rat1 <- lme(weight ~ Diet * Time, random = ~ Time|Rat, data = body))
## Linear mixed-effects model fit by REML
##   Data: body 
##   Log-restricted-likelihood: -575.8599
##   Fixed: weight ~ Diet * Time 
## (Intercept)       Diet2       Diet3        Time  Diet2:Time  Diet3:Time 
## 251.6516516 200.6654865 252.0716778   0.3596391   0.6058392   0.2983375 
## 
## Random effects:
##  Formula: ~Time | Rat
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev     Corr  
## (Intercept) 36.9390723 (Intr)
## Time         0.2484113 -0.149
## Residual     4.4436052       
## 
## Number of Observations: 176
## Number of Groups: 16

Checking residuals

plot(mod_rat1) ## there is some heteroscedasticity but we will ignore it for now

Checking residuals

plot(residuals(mod_rat1), type = "b")

Fitting continuous temporal autocorrelation

(mod_rat2 <- lme(weight ~ Diet * Time, random = ~ Time|Rat, correlation = corExp(form = ~ Time), data = body))
## Linear mixed-effects model fit by REML
##   Data: body 
##   Log-restricted-likelihood: -566.0296
##   Fixed: weight ~ Diet * Time 
## (Intercept)       Diet2       Diet3        Time  Diet2:Time  Diet3:Time 
## 251.5924493 200.6889085 252.3151949   0.3602961   0.6255175   0.3109453 
## 
## Random effects:
##  Formula: ~Time | Rat
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev     Corr  
## (Intercept) 36.9746835 (Intr)
## Time         0.2434623 -0.149
## Residual     4.5501115       
## 
## Correlation Structure: Exponential spatial correlation
##  Formula: ~Time | Rat 
##  Parameter estimate(s):
##    range 
## 3.650438 
## Number of Observations: 176
## Number of Groups: 16
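Under corExp, the correlation between residuals separated by a time lag d is exp(-d / range). A sketch of this function, using the range estimated above (about 3.65 days):

```r
## Exponential decay of the residual correlation with time lag,
## using the range estimated above
d <- seq(0, 20, 0.1)
plot(d, exp(-d / 3.650438), type = "l",
     xlab = "Time lag (days)", ylab = "Correlation")
exp(-7 / 3.650438)  ## ~ 0.15 between consecutive weekly measurements
```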

Model comparison

anova(mod_rat1, mod_rat2)
##          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## mod_rat1     1 10 1171.720 1203.078 -575.8599                        
## mod_rat2     2 11 1154.059 1188.553 -566.0296 1 vs 2 19.66057  <.0001


Note: the comparison makes sense as models are nested and fitted with REML.

Alternative correlation structures

mod_rat3 <- update(mod_rat1, corr = corExp(form = ~ Time, nugget = TRUE))
mod_rat4 <- update(mod_rat1, corr = corRatio(form = ~ Time))
mod_rat5 <- update(mod_rat1, corr = corSpher(form = ~ Time))
mod_rat6 <- update(mod_rat1, corr = corLin(form = ~ Time))
mod_rat7 <- update(mod_rat1, corr = corGaus(form = ~ Time))

rbind(mod_rat2 = AIC(mod_rat2),
      mod_rat3 = AIC(mod_rat3),
      mod_rat4 = AIC(mod_rat4),
      mod_rat5 = AIC(mod_rat5),
      mod_rat6 = AIC(mod_rat6),
      mod_rat7 = AIC(mod_rat7))
##              [,1]
## mod_rat2 1154.059
## mod_rat3 1151.768
## mod_rat4 1155.749
## mod_rat5 1157.233
## mod_rat6 1157.233
## mod_rat7 1157.233

The best model

mod_rat3
## Linear mixed-effects model fit by REML
##   Data: body 
##   Log-restricted-likelihood: -563.8841
##   Fixed: weight ~ Diet * Time 
## (Intercept)       Diet2       Diet3        Time  Diet2:Time  Diet3:Time 
## 251.3005207 200.6582307 253.3068385   0.3623183   0.6421429   0.3063663 
## 
## Random effects:
##  Formula: ~Time | Rat
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev     Corr  
## (Intercept) 36.9431391 (Intr)
## Time         0.2300954 -0.162
## Residual     5.7590445       
## 
## Correlation Structure: Exponential spatial correlation
##  Formula: ~Time | Rat 
##  Parameter estimate(s):
##     range    nugget 
## 17.150214  0.175613 
## Number of Observations: 176
## Number of Groups: 16
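With a nugget, the correlation drops from 1 to (1 - nugget) at any non-zero lag before decaying exponentially. A sketch using the estimates reported above:

```r
## Correlation function implied by the nugget model: a discontinuity of
## size `nugget` at lag 0, then exponential decay with the fitted range
range_hat <- 17.150214
nugget_hat <- 0.175613
d <- seq(0, 64, 0.5)
corr <- ifelse(d == 0, 1, (1 - nugget_hat) * exp(-d / range_hat))
plot(d, corr, type = "l", ylim = c(0, 1),
     xlab = "Time lag (days)", ylab = "Correlation")
```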

Same fit with spaMM

(mod_rat_spaMM <- corrHLfit(weight ~ Diet * Time + (Time|Rat) + Matern(1|Time), data = body,
                            HLmethod = "REML", init.corrHLfit = list(Nugget = 0.1), ranFix = list(nu = 0.5)))
## formula: weight ~ Diet * Time + (Time | Rat) + Matern(1 | Time)
## REML: Estimation of lambda, phi, Nugget and rho by REML.
##       Estimation of fixed effects by ML.
## Estimation of Nugget and rho by 'outer' REML, maximizing p_bv.
## Family: gaussian ( link = identity )
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept) 251.6517 13.13261  19.162
## Diet2       200.6655 22.67994   8.848
## Diet3       252.0717 22.67994  11.114
## Time          0.3596  0.09474   3.796
## Diet2:Time    0.6058  0.15786   3.838
## Diet3:Time    0.2983  0.15786   1.890
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity )
## Correlation parameters: [nu was fixed]
##         nu     Nugget        rho 
## 0.50000000 0.02743695 0.19805116 
## Coefficients for log[ lambda = var(u) ]:
##  Group        Term Estimate Cond.SE    Var.   Corr.
##    Rat (Intercept)       NA      NA    1366        
##    Rat        Time       NA      NA 0.06238 -0.1506
##   Time (Intercept)    1.024  0.5539                
## Estimate of lambda ( Time ):  2.784 
## # of obs: 176; # of groups: Rat, 32; Time, 11 
##  ------------- Residual variance  -------------
## Coefficients for log(phi) ~ 1 :
##             Estimate Cond. SE
## (Intercept)    2.831   0.1203
## Estimate of phi=residual var:  16.96 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -578.1822
##   p_beta,v(h) (ReL): -570.7141

Fitted values: nlme vs spaMM

plot(predict(mod_rat_spaMM), predict(mod_rat3))
abline(0, 1, col = "red")

Better fit with spaMM?

mod_rat_spaMM2 <- corrHLfit(weight ~ Diet * Time + (Time|Rat) + Matern(1|Time), data = body,
                            HLmethod = "REML", init.corrHLfit = list(Nugget = 0))

mod_rat_spaMM3 <- corrHLfit(weight ~ Diet * Time + (Time|Rat) + Matern(1|Time), data = body,
                            HLmethod = "REML", ranFix = list(Nugget = 0, nu = 0.5))

Comparison between exponential and unconstrained Matern

print(AIC(mod_rat_spaMM))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##            1178.3643            1042.2442            1151.4282             138.8841
print(AIC(mod_rat_spaMM2))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##            1178.3643            1042.9976            1151.4282             138.5074
print(AIC(mod_rat_spaMM3))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##            1178.3643            1043.4118            1151.4282             138.3003

Displaying the continuous correlation function

time.pred <- seq(0, 64, 0.1)
corr <- MaternCorr(time.pred, rho = mod_rat_spaMM$corrPars$rho, nu = 0.5, Nugget = mod_rat_spaMM$corrPars$Nugget)
plot(corr ~ time.pred, xlab = "Time (days)", ylab = "Correlation", type = "l")

Testing the overall effect of diet

spaMM

mod_rat_spaMM3ML <- corrHLfit(weight ~ Diet * Time + (Time|Rat) + Matern(1|Time), data = body,
                            HLmethod = "ML", ranFix = list(Nugget = 0, nu = 0.5))
mod_rat_no_diet <- corrHLfit(weight ~ 1 + Time + (Time|Rat) + Matern(1|Time), data = body,
                            HLmethod = "ML", ranFix = list(Nugget = 0, nu = 0.5))
anova(mod_rat_spaMM3ML, mod_rat_no_diet)
##      chi2_LR df     p_value
## p_v 47.79655  4 1.04059e-09
c(logLik(mod_rat_spaMM3ML), logLik(mod_rat_no_diet))
##       p_v       p_v 
## -577.8746 -601.7728

Testing the overall effect of diet

nlme

mod_rat3ML <- lme(weight ~ Diet * Time, random = ~ Time|Rat,
                  correlation = corExp(form = ~ Time, nugget = TRUE), data = body, method = "ML")

mod_rat_no_diet2 <- lme(weight ~ 1 + Time, random = ~ Time|Rat,
                        correlation = corExp(form = ~ Time, nugget = TRUE), data = body, method = "ML")
anova(mod_rat3ML, mod_rat_no_diet2)
##                  Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## mod_rat3ML           1 12 1165.962 1204.007 -570.9808                        
## mod_rat_no_diet2     2  8 1206.618 1231.982 -595.3088 1 vs 2 48.65601  <.0001

Revisiting the airplane passengers with spaMM::fitme()

air$time <- seq(1949, 1960, length = (1960 - 1949 + 1) * 12)

mod_air_spaMM1 <- fitme(passengers ~ month + year + Matern(1|time), data = air, method = "REML")

mod_air_spaMM2 <- fitme(passengers ~ month + year + Matern(1|time), data = air, method = "REML",
                        init = list(Nugget = 0))

mod_air_spaMM3 <- fitme(passengers ~ month + year + Matern(1|time), data = air, method = "REML",
                        init = list(Nugget = 0), fixed = list(nu = 0.5))

mod_air_spaMM4 <- fitme(passengers ~ month + year + Matern(1|time), data = air, method = "REML",
                        fixed = list(nu = 0.5, Nugget = 0))

Model comparison

print(AIC(mod_air_spaMM1))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##         1222.0364901          371.1144890         1134.9616788            0.3123916
print(AIC(mod_air_spaMM2))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##             1217.811                  Inf             1131.229                 -Inf
print(AIC(mod_air_spaMM3))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##             1237.312                  Inf             1148.376                 -Inf
print(AIC(mod_air_spaMM4))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##         1.237279e+03        -9.149563e+02         1.148377e+03         1.786804e-03

Examining the best model

mod_air_spaMM2
## formula: passengers ~ month + year + Matern(1 | time)
## REML: Estimation of lambda, phi, Nugget, nu and rho by REML.
##       Estimation of fixed effects by ML.
## Estimation of lambda, phi, Nugget, nu and rho by 'outer' REML, maximizing p_bv.
## Family: gaussian ( link = identity )
##  ------------ Fixed effects (beta) ------------
##               Estimate Cond. SE  t-value
## (Intercept) -61700.003 2638.899 -23.3810
## month2          -6.340    4.964  -1.2771
## month3          28.928    7.497   3.8587
## month4          25.793    9.300   2.7733
## month5          30.481   10.281   2.9649
## month6          70.284   10.702   6.5674
## month7         109.951   10.814  10.1671
## month8         109.747   10.715  10.2426
## month9          61.240   10.318   5.9355
## month10         25.828    9.380   2.7534
## month11         -7.071    7.640  -0.9255
## month12         23.149    5.226   4.4291
## year            31.692    1.350  23.4769
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity )
## Correlation parameters:
##      Nugget          nu         rho 
##  0.08687389 16.63041149 38.13178822 
## Outer estimate of lambda ( time ):  735.5 
## # of obs: 144; # of groups: time, 144 
##  ------------- Residual variance  -------------
## phi estimate was 1e-06 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -593.9057
##   p_beta,v(h) (ReL): -563.6147

Fitted values

data.for.plot$pred_spaMM <- predict(mod_air_spaMM2)
plot(obs ~ time, data = data.for.plot, type = "l", lwd = 3, ylim = c(0, 700), ylab = "Passengers")
points(pred_spaMM ~ time, data = data.for.plot, type = "l", col = "green")

Note: never extrapolate using such a model! The near-perfect fit is not unusual: with one random-effect level per observation, the Matern term can interpolate the data.

Testing the effect of years

spaMM

mod_air_spaMM2ML <- fitme(passengers ~ month + year + Matern(1|time), data = air, method = "ML",
                        init = list(Nugget = 0))

mod_air_no_year <- fitme(passengers ~ month + Matern(1|time), data = air, method = "ML",
                        init = list(Nugget = 0))
anova(mod_air_spaMM2ML, mod_air_no_year)
##      chi2_LR df      p_value
## p_v 39.62925  1 3.070495e-10
c(logLik(mod_air_spaMM2ML), logLik(mod_air_no_year))
##       p_v       p_v 
## -593.5887 -613.4034

Testing the effect of years

nlme

mod_air3ML <- lme(passengers ~ month + year, random = ~ 1 | year, data = air,
                 correlation = corARMA(p = 6, q = 0), method = "ML")

mod_air_no_year2 <- lme(passengers ~ month, random = ~ 1 | year, data = air,
                 correlation = corARMA(p = 6, q = 0), method = "ML")
anova(mod_air3ML, mod_air_no_year2)
##                  Model df      AIC      BIC    logLik   Test L.Ratio p-value
## mod_air3ML           1 21 1209.245 1271.611 -583.6224                       
## mod_air_no_year2     2 20 1258.970 1318.367 -609.4851 1 vs 2 51.7254  <.0001

Spatial autocorrelation

Maximum normalised-difference vegetation index in north Cameroon

data("Loaloa", package = "spaMM")
ndvi <- Loaloa[, c("maxNDVI", "latitude", "longitude")]
head(ndvi)
##   maxNDVI latitude longitude
## 1    0.69 5.736750  8.041860
## 2    0.74 5.680280  8.004330
## 3    0.79 5.347222  8.905556
## 4    0.67 5.917420  8.100720
## 5    0.85 5.104540  8.182510
## 6    0.80 5.355556  8.929167

Visualising the data

library(maps)
spaMMplot2D(x = ndvi$longitude, y = ndvi$latitude, z = ndvi$maxNDVI, add.map = TRUE,
            xlab = "Longitude", ylab = "Latitude", plot.title = title(main = "max NDVI"))

Visualising the data

pairs(ndvi)

Fitting the model

(mod_ndvi1 <- fitme(maxNDVI ~ 1 + Matern(1|longitude + latitude), data = ndvi, method = "REML"))
## formula: maxNDVI ~ 1 + Matern(1 | longitude + latitude)
## REML: Estimation of lambda, phi, nu and rho by REML.
##       Estimation of fixed effects by ML.
## Estimation of lambda, phi, nu and rho by 'outer' REML, maximizing p_bv.
## Family: gaussian ( link = identity )
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept)   0.7734   0.2661   2.906
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity )
## Correlation parameters:
##        nu       rho 
## 0.6405954 0.1254427 
## Outer estimate of lambda ( longitude. ):  0.1009 
## # of obs: 197; # of groups: longitude., 197 
##  ------------- Residual variance  -------------
## phi estimate was 0.000442359 
##  ------------- Likelihood values  -------------
##                        logLik
## p_v(h) (marginal L): 370.2011
##   p_beta,v(h) (ReL): 369.7961
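For intuition, the Matern correlation can be written in base R using besselK(). This sketch assumes spaMM's parameterization, in which distances are scaled by rho; nu and rho are the estimates reported above:

```r
## Matern correlation 2^(1 - nu) / gamma(nu) * (rho * d)^nu * K_nu(rho * d),
## assuming spaMM's scaling of distance by rho
matern <- function(d, nu, rho) {
  x <- rho * d
  ifelse(x == 0, 1, 2^(1 - nu) / gamma(nu) * x^nu * besselK(x, nu))
}
d <- seq(0, 10, 0.1)  ## distance (degrees)
plot(d, matern(d, nu = 0.6405954, rho = 0.1254427), type = "l",
     xlab = "Distance", ylab = "Correlation")
```

With nu = 0.5 this reduces to exp(-rho * d), i.e. the exponential correlation used earlier for the rats.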

Predictions

mapMM(mod_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Predictions

filled.mapMM(mod_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Prediction uncertainty

x.for.pred <- seq(min(ndvi$longitude), max(ndvi$longitude), length.out = 100)
y.for.pred <- seq(min(ndvi$latitude), max(ndvi$latitude), length.out = 50)
data.for.pred <- expand.grid(longitude = x.for.pred, latitude = y.for.pred)
gridpred <- predict(mod_ndvi1, newdata = data.for.pred, variances = list(predVar = TRUE))
data.for.pred$predVar <- attr(gridpred, "predVar")
m <- matrix(data.for.pred$predVar, ncol = length(y.for.pred))

Prediction uncertainty

spaMM.filled.contour(x = x.for.pred, y = y.for.pred, z = m, plot.axes = {
  points(ndvi[, c("longitude", "latitude")])}, col = spaMM.colors(redshift = 3))

Non-Gaussian response

GLMM

GLM + LMM = GLMM

\[\begin{array}{lcl} \mu &=& g^{-1}(\eta)\\ \mu &=& g^{-1}(\mathbf{X}\beta + \mathbf{Z}b)\\ \end{array} \]

with (as for GLM):

  • \(\text{E}(\text{Y}) = \mu = g^{-1}(\eta)\)
  • \(\text{Var}(\text{Y}) = \phi\text{V}(\mu)\)


Note:

  • If \(g^{-1}\) is the identity function, \(\phi = \sigma^2\) and \(\text{V}(\mu) = 1\), we have the LMM.
  • If \(\mathbf{Z}b = 0\), we have the GLM.
  • If \(g^{-1}\) is the identity function, \(\phi = \sigma^2\), \(\text{V}(\mu) = 1\), and \(\mathbf{Z}b = 0\), we have the LM.
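In R, a family object bundles the link function g, its inverse g^{-1}, and the variance function V(mu) used in these formulas; a quick look at the Poisson family:

```r
## The components of a family object map directly onto the notation above
fam <- poisson()
fam$linkfun(10)    ## g(mu) = log(mu)
fam$linkinv(2.3)   ## g^{-1}(eta) = exp(eta)
fam$variance(10)   ## V(mu) = mu, so Var(Y) = phi * V(mu) with phi = 1
```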

The LM2GLMM::Flatwork dataset

Flatwork
##    individual gender month shopping cleaning other absent
## 1           1      w   nov        4        8     3      0
## 2           2      w   nov       10        5    10      7
## 3           3      w   nov        3        1     2      4
## 4           4      w   nov       15        2     0      0
## 5           5      w   nov       15        9     4      0
## 6           6      m   nov        0        2     8      0
## 7           7      w   nov        5        6     6     10
## 8           8      m   nov        3        4     5     11
## 9           9      m   nov        5        7     6      4
## 10         10      m   nov        6        3     0      7
## 11          1      w   dec        7        6     3      1
## 12          2      w   dec        6        2     3      7
## 13          3      w   dec        5        7     1      5
## 14          4      w   dec       14        5     1      0
## 15          5      w   dec        5        5     1      0
## 16          6      m   dec        6        1     0      5
## 17          7      w   dec        3        1     0      1
## 18          8      m   dec        2        1     0      0
## 19          9      m   dec        4        0     0      5
## 20         10      m   dec        7        3     1      3
## 21          1      w   jan       11        3     3      1
## 22          2      w   jan        3        8     2      7
## 23          3      w   jan        7        7    12      4
## 24          4      w   jan        5        2     5      0
## 25          5      w   jan        6        6     0      0
## 26          6      m   jan       13        0     5      2
## 27          7      w   jan        2       16     6      6
## 28          8      m   jan        7        6     9      0
## 29          9      m   jan        0        4     2      5
## 30         10      m   jan        6        3     9      7
## 31          1      w   feb        8        3     1      7
## 32          2      w   feb        2        4     1      6
## 33          3      w   feb        1        7    16      1
## 34          4      w   feb        0        0     0      0
## 35          5      w   feb       12       10     6      0
## 36          6      m   feb        7        0     1      0
## 37          7      w   feb        3        2    10      6
## 38          8      m   feb        7        4     3      0
## 39          9      m   feb        0        0     3      7
## 40         10      m   feb       16        2     0      0
## 41          1      w   mar        0        0     0      0
## 42          2      w   mar       17        2     5     11
## 43          3      w   mar        0        6     3      3
## 44          4      w   mar       11        5     6     12
## 45          5      w   mar        8       15    12      0
## 46          6      m   mar       10        0    14      0
## 47          7      w   mar        6        5     6      7
## 48          8      m   mar        8        6     6      0
## 49          9      m   mar       11        0     0      8
## 50         10      m   mar       10        2     1      0
## 51          1      w   apr        4        2     0      0
## 52          2      w   apr        9        9    12      0
## 53          3      w   apr       19        3     2      3
## 54          4      w   apr       14        4     6      6
## 55          5      w   apr        3        1     5      0
## 56          6      m   apr        4        1     0      0
## 57          7      w   apr       10       13     8      0
## 58          8      m   apr        0        2     2     10
## 59          9      m   apr        1        6     0      0
## 60         10      m   apr        4        4     1      0

The LM2GLMM::Flatwork dataset

str(Flatwork)
## 'data.frame':    60 obs. of  7 variables:
##  $ individual: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender    : Factor w/ 2 levels "m","w": 2 2 2 2 2 1 2 1 1 1 ...
##  $ month     : Factor w/ 6 levels "apr","dec","feb",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ shopping  : int  4 10 3 15 15 0 5 3 5 6 ...
##  $ cleaning  : int  8 5 1 2 9 2 6 4 7 3 ...
##  $ other     : int  3 10 2 0 4 8 6 5 6 0 ...
##  $ absent    : int  0 7 4 0 0 0 10 11 4 7 ...

GLMM with lme4

(mod_glmm_lme4 <- glmer(shopping ~ gender + (1|individual) + (1|month), family = poisson(),
                        data = Flatwork))
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: poisson  ( log )
## Formula: shopping ~ gender + (1 | individual) + (1 | month)
##    Data: Flatwork
##       AIC       BIC    logLik  deviance  df.resid 
##  421.7239  430.1012 -206.8619  413.7239        56 
## Random effects:
##  Groups     Name        Std.Dev.
##  individual (Intercept) 0.2261  
##  month      (Intercept) 0.0510  
## Number of obs: 60, groups:  individual, 10; month, 6
## Fixed Effects:
## (Intercept)      genderw  
##      1.7107       0.2159

GLMM with spaMM

(mod_glmm_spaMM <- fitme(shopping ~ gender + (1|individual) + (1|month), family = poisson(),
                        data = Flatwork, method = "ML"))
## formula: shopping ~ gender + (1 | individual) + (1 | month)
## Estimation of lambda by ML approximation (p_v).
## Estimation of fixed effects by ML approximation (p_v).
## Estimation of lambda by 'outer' ML, maximizing p_v.
## Family: poisson ( link = log )
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept)   1.7107   0.1440  11.879
## genderw       0.2159   0.1813   1.191
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity )
## Outer estimate of lambda ( individual ):  0.05113
## Outer estimate of lambda ( month ):  0.002602 
## # of obs: 60; # of groups: individual, 10; month, 6 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -206.8619

Checking residuals

library(DHARMa)
r <- simulateResiduals(mod_glmm_lme4)
plot(r)

Extra 0s?

barplot(table(Flatwork$shopping))

Extra 0s?

testZeroInflation(r)

## 
##  DHARMa zero-inflation test via comparison to expected zeros with simulation under H0 =
##  fitted model
## 
## data:  r
## ratioObsExp = 24.306, p-value < 2.2e-16
## alternative hypothesis: more
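The ratioObsExp above says we observe roughly 24 times more zeros than the fitted model predicts. A back-of-the-envelope check (a sketch using only the fixed-effect intercept reported earlier, ignoring the random effects for simplicity) shows why zeros are so surprising under a Poisson model:

```r
## Expected count for men under the fitted Poisson GLMM, on the response scale
## (random effects ignored for simplicity):
lambda <- exp(1.7107)
## Expected number of zeros among the 60 observations under Poisson(lambda):
dpois(0, lambda) * 60  # ~0.24, i.e. less than one zero in the whole dataset
```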

Binomial model

Flatwork$shopping_bin <- Flatwork$shopping > 0
(mod_glmm_lme4bin <- glmer(shopping_bin ~ gender + (1|individual) + (1|month), family = binomial(),
                        data = Flatwork))
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: shopping_bin ~ gender + (1 | individual) + (1 | month)
##    Data: Flatwork
##      AIC      BIC   logLik deviance df.resid 
##  50.2791  58.6565 -21.1396  42.2791       56 
## Random effects:
##  Groups     Name        Std.Dev. 
##  individual (Intercept) 8.887e-09
##  month      (Intercept) 4.127e-08
## Number of obs: 60, groups:  individual, 10; month, 6
## Fixed Effects:
## (Intercept)      genderw  
##      1.6094       0.7885

Checking residuals

r_bin <- simulateResiduals(mod_glmm_lme4bin)
plot(r_bin)

Overdispersion?

r_bin2 <- simulateResiduals(mod_glmm_lme4bin, refit = TRUE)  ## slow and convergence issues...
testOverdispersion(r_bin2)
## 
##  DHARMa nonparametric overdispersion test via comparison to simulation under H0 = fitted
##  model
## 
## data:  r_bin2
## dispersion = NA, p-value = 0.08434
## alternative hypothesis: overdispersion

Testing the gender effect

mod_glmm_lme4bin0 <- glmer(shopping_bin ~ 1 + (1|individual) + (1|month), family = binomial(),
                        data = Flatwork)

anova(mod_glmm_lme4bin, mod_glmm_lme4bin0)
## Data: Flatwork
## Models:
## mod_glmm_lme4bin0: shopping_bin ~ 1 + (1 | individual) + (1 | month)
## mod_glmm_lme4bin: shopping_bin ~ gender + (1 | individual) + (1 | month)
##                   Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## mod_glmm_lme4bin0  3 49.228 55.511 -21.614   43.228                         
## mod_glmm_lme4bin   4 50.279 58.657 -21.140   42.279 0.9485      1     0.3301

Same with spaMM

mod_glmm_spaMMbin <- fitme(shopping_bin ~ gender + (1|individual) + (1|month), family = binomial(),
                        data = Flatwork)

mod_glmm_spaMMbin0 <- fitme(shopping_bin ~ 1 + (1|individual) + (1|month), family = binomial(),
                        data = Flatwork)

anova(mod_glmm_spaMMbin, mod_glmm_spaMMbin0)
##       chi2_LR df   p_value
## p_v 0.9485333  1 0.3300929

Is there an effect for the non-zeros?

This is not ideal, but we will try an analysis of the truncated distribution…

Flatwork_pos <- subset(Flatwork, Flatwork$shopping_bin)
barplot(table(Flatwork_pos$shopping))

Fitting models on truncated distributions

mod_glmm_lme4pos1 <- glmer(shopping ~ gender + (1|individual) + (1|month), family = poisson(),
                        data = Flatwork_pos)
mod_glmm_lme4pos2 <- glmer.nb(shopping ~ gender + (1|individual) + (1|month), data = Flatwork_pos)
mod_glmm_spaMMpos1 <- fitme(shopping ~ gender + (1|individual) + (1|month), family = poisson(),
                        data = Flatwork_pos)
mod_glmm_spaMMpos2 <- fitme(shopping ~ gender + (1|individual) + (1|month), family = negbin(),
                        data = Flatwork_pos)
mod_glmm_spaMMpos3 <- fitme(shopping ~ gender + (1|individual) + (1|month), family = COMPoisson(),
                        data = Flatwork_pos)

c(AIC(mod_glmm_lme4pos1), AIC(mod_glmm_lme4pos2))
## [1] 328.9853 306.3538
print(c(AIC(mod_glmm_spaMMpos1), AIC(mod_glmm_spaMMpos2), AIC(mod_glmm_spaMMpos3)))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##            328.98531            320.77083            324.98531             44.54738 
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##            304.35383            300.35388            300.35383             52.99970 
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##            305.69867            301.69886            301.69867             52.99809

Checking residuals

r_pos <- simulateResiduals(mod_glmm_lme4pos2)
plot(r_pos)

Testing the gender effect again

mod_glmm_spaMMpos20 <- fitme(shopping ~ 1 + (1|individual) + (1|month), family = negbin(),
                        data = Flatwork_pos)

anova(mod_glmm_spaMMpos20, mod_glmm_spaMMpos2)
##       chi2_LR df   p_value
## p_v 0.4453586  1 0.5045474

Practice


What about gender and cleaning?

Non-Gaussian random effects

The Hierarchical GLM (HGLM)

\[\begin{array}{lcl} \mu &=& g^{-1}(\eta)\\ \mu &=& g^{-1}(\mathbf{X}\beta + \mathbf{Z}b)\\ \mu &=& g^{-1}(\mathbf{X}\beta + \mathbf{Z}f(u)) \end{array} \]


Note:

  • If \(f(u)\) is the identity function and \(u\) is drawn from a normal distribution, then we have the GLMM, a particular case of the more general HGLM.
  • Hence LM, GLM, LMM and GLMM are all particular cases of the HGLM.

The example of the negative binomial

library(MASS)

mod_negbin <- glm.nb(Days ~ Sex/Age, data = quine)

quine$index <- factor(1:nrow(quine))

mod_poiss_gamma <- fitme(Days ~ Sex/Age + (1|index), data = quine,
                         family = poisson(), rand.family = Gamma("log"))

rbind(mod_negbin$coefficients, mod_poiss_gamma$fixef)
##      (Intercept)       SexM SexF:AgeF1 SexM:AgeF1  SexF:AgeF2 SexM:AgeF2 SexF:AgeF3 SexM:AgeF3
## [1,]    2.928524 -0.3957609 -0.3659809 -0.5868525 -0.01502935  0.6211936 -0.2894662  0.7709794
## [2,]    2.928524 -0.3957609 -0.3659809 -0.5868525 -0.01502935  0.6211936 -0.2894662  0.7709794


Note: the equivalence is expected! In more complex models differences may appear.
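The equivalence can be illustrated by simulation (a minimal base-R sketch): multiplying a Poisson rate by a Gamma-distributed term with mean 1 and variance \(1/\theta\) yields counts whose marginal variance matches the negative binomial, \(\mu + \mu^2/\theta\):

```r
set.seed(123)
mu <- 5; theta <- 2
u <- rgamma(1e5, shape = theta, rate = theta)  # multiplicative term: mean 1, variance 1/theta
y <- rpois(1e5, lambda = mu * u)               # Poisson-Gamma mixture
mean(y)  # close to mu = 5
var(y)   # close to mu + mu^2/theta = 17.5 (not mu, as a plain Poisson would give)
```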

The beta-binomial HGLM


\[ \begin{array}{lcl} \text{logit}(p) &=& \text{ln}\left(\frac{p}{1-p}\right) = \mathbf{X}\beta + \mathbf{Z}b\\ b &=& \text{logit}(u) = \text{ln}\left(\frac{u}{1-u}\right) \end{array} \]


with \(u\) following the beta distribution.


It can be useful to model heterogeneity in proportions!
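A minimal base-R sketch of why this matters: if the success probability varies between observations (here Beta-distributed), binomial counts become overdispersed relative to a plain binomial with the same mean:

```r
set.seed(123)
n <- 50
p <- rbeta(1e4, shape1 = 2, shape2 = 2)        # heterogeneous probabilities, mean 0.5
y_betabin <- rbinom(1e4, size = n, prob = p)   # beta-binomial counts
y_bin     <- rbinom(1e4, size = n, prob = 0.5) # plain binomial counts, same mean
var(y_bin)      # ~ n * 0.5 * 0.5 = 12.5
var(y_betabin)  # ~ 12.5 * (1 + (n - 1)/5) = 135: strong overdispersion
```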

The spaMM::seeds dataset

data(seeds)
seeds
##    plate seed  extract  r  n
## 1      1  O75     Bean 10 39
## 2      2  O75     Bean 23 62
## 3      3  O75     Bean 23 81
## 4      4  O75     Bean 26 51
## 5      5  O75     Bean 17 39
## 6      6  O73     Bean  8 16
## 7      7  O73     Bean 10 30
## 8      8  O73     Bean  8 28
## 9      9  O73     Bean 23 45
## 10    10  O73     Bean  0  4
## 11    11  O75 Cucumber  5  6
## 12    12  O75 Cucumber 53 74
## 13    13  O75 Cucumber 55 72
## 14    14  O75 Cucumber 32 51
## 15    15  O75 Cucumber 46 79
## 16    16  O75 Cucumber 10 13
## 17    17  O73 Cucumber  3 12
## 18    18  O73 Cucumber 22 41
## 19    19  O73 Cucumber 15 30
## 20    20  O73 Cucumber 32 51
## 21    21  O73 Cucumber  3  7

The spaMM::seeds dataset

coplot(r/n ~ plate | seed + extract, data = seeds)

Fitting the germination data using the HGLM

(mod_germ1 <- fitme(cbind(r, n - r) ~ seed * extract + (1|plate), family = binomial(), rand.family = Beta(),
                   data = seeds, method = "REML"))
## formula: cbind(r, n - r) ~ seed * extract + (1 | plate)
## Estimation of lambda by REML approximation (p_bv).
## Estimation of fixed effects by ML approximation (p_v).
## Estimation of lambda by 'outer' REML, maximizing p_bv.
## Family: binomial ( link = logit )
##  ------------ Fixed effects (beta) ------------
##                         Estimate Cond. SE t-value
## (Intercept)             -0.54828   0.1903 -2.8818
## seedO73                  0.07625   0.3079  0.2476
## extractCucumber          1.35307   0.2697  5.0165
## seedO73:extractCucumber -0.83192   0.4292 -1.9383
##  --------------- Random effects ---------------
## Family: beta ( link = logit )
## Outer estimate of lambda ( plate ):  0.02409 
## # of obs: 21; # of groups: plate, 21 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -54.07093
##   p_beta,v(h) (ReL): -56.59749

Comparison to the binomial GLMM

mod_germ2 <- fitme(cbind(r, n - r) ~ seed * extract + (1|plate), family = binomial(),
                   data = seeds, method = "REML")

print(rbind(AIC(mod_germ1), AIC(mod_germ2)))
##             marginal AIC:     conditional AIC:      dispersion AIC:        effective df:
## [1,]             118.1419             112.5289             115.1950            10.068957
## [2,]             118.0799             112.5009             115.0612             9.932794


OK… here it does not make much of a difference, but it is still worth trying.

Heteroscedasticity

Let's revisit the rats

mod_rat_spaMM <- corrHLfit(weight ~ Diet * Time + (Time|Rat) + Matern(1|Time), data = body,
                            HLmethod = "REML", init.corrHLfit = list(Nugget = 0.1), ranFix = list(nu = 0.5))
coplot(residuals(mod_rat_spaMM) ~ I(1:nrow(body)) | body$Diet, show.given = FALSE)

Let's revisit the rats

mod_rat_hetero <- corrHLfit(weight ~ Diet * Time + (Time|Rat) + Matern(1|Time), data = body,
                           HLmethod = "REML", init.corrHLfit = list(Nugget = 0.1), ranFix = list(nu = 0.5),
                           resid.formula = ~ Diet)
summary.tables <- summary(mod_rat_hetero)
summary.tables$phi_table
##               Estimate  Cond. SE
## (Intercept)  2.6398121 0.1704731
## Diet2       -0.2003986 0.2965613
## Diet3        0.7120606 0.2924082
print(rbind(AIC(mod_rat_spaMM),
            AIC(mod_rat_hetero)))
##             marginal AIC:     conditional AIC:      dispersion AIC:        effective df:
## [1,]             1178.364             1042.244             1151.428             138.8841
## [2,]             1173.472             1032.337             1146.540             138.2339

Let's re-test the overall effect of the diet

mod_rat_hetero <- corrHLfit(weight ~ Diet * Time + (Time|Rat) + Matern(1|Time), data = body,
                           HLmethod = "ML", init.corrHLfit = list(Nugget = 0.1), ranFix = list(nu = 0.5),
                           resid.formula = ~ Diet)

mod_rat_hetero0 <- corrHLfit(weight ~ Time + (Time|Rat) + Matern(1|Time), data = body,
                           HLmethod = "ML", init.corrHLfit = list(Nugget = 0.1), ranFix = list(nu = 0.5),
                           resid.formula = ~ Diet)

anova(mod_rat_hetero, mod_rat_hetero0)
##      chi2_LR df      p_value
## p_v 47.77297  4 1.052435e-09

You can handle heteroscedasticity in simple models too!

set.seed(1L)
d <- data.frame(y = c(rnorm(100, mean = 10, sd = sqrt(10)),
                      rnorm(100, mean = 20, sd = sqrt(20))),
                group = factor(c(rep("A", 100), rep("B", 100))))

m <- fitme(y ~ group, resid.model = ~ group, data = d, method = "REML")
unique(m$phi)
## [1]  8.067621 18.350646
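The two phi estimates are essentially the per-group residual variances; as a base-R sanity check on the same simulated data, the sample variances should closely match them:

```r
set.seed(1L)
d <- data.frame(y = c(rnorm(100, mean = 10, sd = sqrt(10)),
                      rnorm(100, mean = 20, sd = sqrt(20))),
                group = factor(c(rep("A", 100), rep("B", 100))))
tapply(d$y, d$group, var)  # should be close to the phi estimates above (~8.07 and ~18.35)
```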

An example of many difficulties combined: IsoriX

What is IsoriX?

library(IsoriX)
## 
##  IsoriX version 0.5 is loaded!
## 
##  Many functions and objects have changed names since the version 0.4.
##  This is to make IsoriX more intuitive for you to use.
##  We will do our best to limit changes in names in the future!!
## 
##  Type:
##     * ?IsoriX for a short description.
##     * browseVignettes(package='IsoriX') for tutorials.
##     * news(package='IsoriX') for news.

Loading the GNIP data

data(GNIPdata)
dim(GNIPdata)
## [1] 62040     7
head(GNIPdata)
##     lat  long elev isoscape.value year month stationID
## 1 78.91 11.93    7          -72.8 1990     1    100400
## 2 78.91 11.93    7          -67.6 1990     2    100400
## 3 78.91 11.93    7         -107.0 1990     3    100400
## 4 78.91 11.93    7          -77.8 1990     4    100400
## 5 78.91 11.93    7          -77.1 1990     5    100400
## 6 78.91 11.93    7          -67.3 1990     6    100400

Crop and aggregate the GNIP data for Europe

GNIPdataEU <- queryGNIP(data = GNIPdata, long.min = -30, long.max = 60,
                        lat.min = 30, lat.max = 70)

dim(GNIPdataEU)
## [1] 322   7
head(GNIPdataEU)
##   stationID isoscape.value var.isoscape.value n.isoscape.value   lat  long elev
## 1    142700      -46.19083           195.0560              109 58.10  6.57   13
## 2    206000     -122.47571           919.5862               70 68.68 21.53  403
## 3    284500     -104.19625           618.5105               80 66.49 25.75  107
## 4    291701     -101.16667           877.2270               69 62.89 27.62  116
## 5    297401      -83.32869           559.0186              122 60.18 24.83   30
## 6    304401      -50.63818           139.2143               22 58.15 -4.97   73

IsoriX using IsoriX

Europefit <- isofit(iso.data = GNIPdataEU, mean.model.fix = list(elev = TRUE, lat.abs = TRUE))
data(elevraster)
elevationraster <- relevate(elevation.raster = elevraster, manual.crop = c(-30, 60, 30, 70))
## class       : RasterLayer 
## dimensions  : 48, 108, 5184  (nrow, ncol, ncell)
## resolution  : 0.8333333, 0.8333333  (x, y)
## extent      : -30.00014, 59.99986, 29.83319, 69.83319  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : in memory
## names       : elevationrastersmall.asc 
## values      : -27.1915, 2499.111  (min, max)

The data for the elevation

library(rasterVis)
plot(elevationraster)

IsoriX using IsoriX

isoscape <- isoscape(elevation.raster = elevationraster, isofit = Europefit)

data(countries)
data(oceanmask)
data(isopalette1)

plot.mean <- plot(x = isoscape, which = "mean", borders = list(borders = countries),
    mask = list(mask = oceanmask), palette = isopalette1, plot = FALSE)

plot.disp <- plot(x = isoscape, which = "disp", borders = list(borders = countries),
    mask = list(mask = oceanmask), palette = isopalette1, plot = FALSE)

Mean prediction of the distribution of Deuterium

plot.mean

Prediction of the residual variance in Deuterium

plot.disp

IsoriX using spaMM

disp.fit <- fitme(var.isoscape.value ~ 1 + Matern(1 | long + lat), family = Gamma(log),
                  prior.weights = n.isoscape.value - 1, method = "REML", fixed = list(phi = 2),
                  control.dist = list(dist.method = "Earth"), data = GNIPdataEU)
GNIPdataEU$pred.disp <- predict(disp.fit)[, 1]
mean.fit <- fitme(isoscape.value ~ lat + elev + Matern(1 | long + lat), family = gaussian(), 
                  prior.weights = n.isoscape.value, method = "REML",
                  resid.model = list(formula = ~ 0 + offset(pred.disp), family = Gamma(identity)),
                  control.dist = list(dist.method = "Earth"), data = GNIPdataEU)
Europefit2 <- list(mean.fit = mean.fit, disp.fit = disp.fit)

Predictions

isoscape2 <- isoscape(elevation.raster = elevationraster, isofit = Europefit2)
plot(x = isoscape2, which = "mean", borders = list(borders = countries),
    mask = list(mask = oceanmask), palette = isopalette1, plot = TRUE)

DHGLM

A DHGLM (double hierarchical GLM) allows random effects in the residual-dispersion model too: below, a Matérn spatial random effect appears in both the mean part and the dispersion part.
GNIP <- subset(GNIPdata, GNIPdata$lat > 30 & GNIPdata$lat < 70 & GNIPdata$long > -30 & GNIPdata$long < 60)
dim(GNIP)
system.time(
  dhglm <- corrHLfit(isoscape.value ~ lat + elev + Matern(1 | long + lat), family = gaussian(), 
              HLmethod = "REML", data = GNIP, control.dist = list(dist.method = "Earth"),
              resid.model = list(formula = ~ 1 + Matern(1 | long + lat),
                                 control.dist = list(dist.method = "Earth"),
                                 family = Gamma(log), fixed = list(phi = 2)))
)

What you need to remember

  • how to handle temporal and spatial autocorrelation
  • that GLMM are not very difficult if you already know GLM and LMM
  • that random effects as well can have non Gaussian distribution
  • that there are even more general methods than GLMM out there: HGLM and DHGLM
  • how to handle heteroscedasticity

Table of contents

Mixed-effects models